{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "### 主要记载了机器学习中常用的数学距离"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "欧几里得距离或欧几里得度量是欧几里得空间中两点间的即直线距离。使用这个距离，欧氏空间成为度量空间，相关联的范数称为欧几里得范数。"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "EuclideanDistance(1, 5)： 4.0\n",
      "EuclideanDistance([1, 5], [4, 7])： 3.605551275463989\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "def EuclideanDistance(x, y):\n",
    "    x = np.array(x)\n",
    "    y = np.array(y)\n",
    "    return np.sqrt(np.sum(np.square(x - y)))\n",
    "\n",
    "print('EuclideanDistance(1, 5)：', EuclideanDistance(1, 5))\n",
    "print('EuclideanDistance([1, 5], [4, 7])：', EuclideanDistance([1, 5], [4, 7]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "曼哈顿距离是种使用在几何度量空间的几何学用语，用以标明两个点在标准坐标系上的绝对轴距总和"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ManhattanDistance(1, 5)： 4\n",
      "ManhattanDistance([1, 5], [4, 7])： 5\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "def ManhattanDistance(x, y):\n",
    "    x = np.array(x)\n",
    "    y = np.array(y)\n",
    "    return np.sum(np.abs(x - y))\n",
    "\n",
    "print('ManhattanDistance(1, 5)：', ManhattanDistance(1, 5))\n",
    "print('ManhattanDistance([1, 5], [4, 7])：', ManhattanDistance([1, 5], [4, 7]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "闵可夫斯基距离不是一种距离，而是一组距离的定义，是对多个距离度量公式的概括性的表述，它包含了欧几里得距离p=2，曼哈顿距离:p=1"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MinkowskiDistance([1, 5], [4, 7],2)： 3.605551275463989\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import math\n",
    "def MinkowskiDistance(x, y, p):\n",
    "    zipped_coordinate = zip(x, y)\n",
    "    return math.pow(np.sum([math.pow(np.abs(i[0] - i[1]), p) for i in zipped_coordinate]), 1 / p)\n",
    "\n",
    "print('MinkowskiDistance([1, 5], [4, 7],2)：', MinkowskiDistance([1, 5], [4, 7],2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "切比雪夫距离（Chebyshev Distance）为$L\\infty$度量，是向量空间中的一种度量，二个点之间的距离定义是其各坐标数值差绝对值的最大值。以数学的观点来看，切比雪夫距离是由一致范数（或称为上确界范数）所衍生的度量，也是超凸度量的一种"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ChebyshevDistance： 3\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "def ChebyshevDistance(x, y):\n",
    "    x = np.array(x)\n",
    "    y = np.array(y)\n",
    "    return np.max(np.abs(x - y))\n",
    "\n",
    "\n",
    "print('ChebyshevDistance：', ChebyshevDistance([1, 5], [4, 7]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "标准化的欧几里得距离是针对简单欧几里得距离的缺点而作的一种改进方案\n",
    "思路为：将各个分量都“标准化”到均值、方差相等的区间，即 $X^* = \\frac{X - m}{s}$\n",
    "其中$X^*$为标准化后的值，X为原值，m为分量的均值，s为分量的标准差"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "StandardizedEuclideanDistance：  2.0\n",
      "StandardizedEuclideanDistance：  1.4142135623730951\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "def StandardizedEuclideanDistance(x, y):\n",
    "    x = np.array(x)\n",
    "    y = np.array(y)\n",
    "\n",
    "    X = np.vstack([x, y])\n",
    "    sigma = np.var(X, axis=0, ddof=1)\n",
    "    return np.sqrt(((x - y) ** 2 / sigma).sum())\n",
    "\n",
    "\n",
    "print('StandardizedEuclideanDistance： ', StandardizedEuclideanDistance([1, 5], [4, 7]))\n",
    "print('StandardizedEuclideanDistance： ', StandardizedEuclideanDistance(2, 4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "马氏距离（Mahalanobis Distance）是由印度统计学家马哈拉诺比斯(P. C. Mahalanobis)提出的，表示数据的协方差距离。它是一种有效的计算两个未知样本集的相似度的方法。与欧氏距离不同的是它考虑到各种特性之间的联系并且是尺度无关的，即独立于测量尺度。对于一个均值为μ，协方差矩阵为$Σ \\Sigma$的多变量向量，其n维空间中的马氏距离为：$d(x,y) = \\sqrt{(x-y)^{T} \\sum^{-1} (x-y)}$若协方差矩阵为单位矩阵，那么马氏距离就简化为欧几里得距离（Euclidean Distance）。若协方差矩阵为对角阵，则其转为标准化的欧几里得距离（Standardized Euclidean Distance）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "ename": "LinAlgError",
     "evalue": "Singular matrix",
     "output_type": "error",
     "traceback": [
      "\u001B[1;31m---------------------------------------------------------------------------\u001B[0m",
      "\u001B[1;31mLinAlgError\u001B[0m                               Traceback (most recent call last)",
      "\u001B[1;32m<ipython-input-14-0bc3ecec26af>\u001B[0m in \u001B[0;36m<module>\u001B[1;34m\u001B[0m\n\u001B[0;32m     20\u001B[0m \u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0;32m     21\u001B[0m     \u001B[1;32mreturn\u001B[0m \u001B[0md1\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[1;32m---> 22\u001B[1;33m \u001B[0mprint\u001B[0m\u001B[1;33m(\u001B[0m\u001B[1;34m'MahalanobisDistance： '\u001B[0m\u001B[1;33m,\u001B[0m \u001B[0mMahalanobisDistance\u001B[0m\u001B[1;33m(\u001B[0m\u001B[1;33m[\u001B[0m\u001B[1;36m1\u001B[0m\u001B[1;33m,\u001B[0m \u001B[1;36m5\u001B[0m\u001B[1;33m]\u001B[0m\u001B[1;33m,\u001B[0m \u001B[1;33m[\u001B[0m\u001B[1;36m4\u001B[0m\u001B[1;33m,\u001B[0m \u001B[1;36m7\u001B[0m\u001B[1;33m]\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0m\u001B[0;32m     23\u001B[0m \u001B[0mprint\u001B[0m\u001B[1;33m(\u001B[0m\u001B[1;34m'MahalanobisDistance： '\u001B[0m\u001B[1;33m,\u001B[0m \u001B[0mMahalanobisDistance\u001B[0m\u001B[1;33m(\u001B[0m\u001B[1;36m2\u001B[0m\u001B[1;33m,\u001B[0m \u001B[1;36m4\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n",
      "\u001B[1;32m<ipython-input-14-0bc3ecec26af>\u001B[0m in \u001B[0;36mMahalanobisDistance\u001B[1;34m(x, y)\u001B[0m\n\u001B[0;32m     10\u001B[0m     \u001B[0mX_T\u001B[0m \u001B[1;33m=\u001B[0m \u001B[0mX\u001B[0m\u001B[1;33m.\u001B[0m\u001B[0mT\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0;32m     11\u001B[0m     \u001B[0msigma\u001B[0m \u001B[1;33m=\u001B[0m \u001B[0mnp\u001B[0m\u001B[1;33m.\u001B[0m\u001B[0mcov\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0mX\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[1;32m---> 12\u001B[1;33m     \u001B[0msigma_inverse\u001B[0m \u001B[1;33m=\u001B[0m \u001B[0mnp\u001B[0m\u001B[1;33m.\u001B[0m\u001B[0mlinalg\u001B[0m\u001B[1;33m.\u001B[0m\u001B[0minv\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0msigma\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0m\u001B[0;32m     13\u001B[0m \u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0;32m     14\u001B[0m     \u001B[0md1\u001B[0m\u001B[1;33m=\u001B[0m\u001B[1;33m[\u001B[0m\u001B[1;33m]\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n",
      "\u001B[1;32m<__array_function__ internals>\u001B[0m in \u001B[0;36minv\u001B[1;34m(*args, **kwargs)\u001B[0m\n",
      "\u001B[1;32mE:\\Anaconda3\\lib\\site-packages\\numpy\\linalg\\linalg.py\u001B[0m in \u001B[0;36minv\u001B[1;34m(a)\u001B[0m\n\u001B[0;32m    543\u001B[0m     \u001B[0msignature\u001B[0m \u001B[1;33m=\u001B[0m \u001B[1;34m'D->D'\u001B[0m \u001B[1;32mif\u001B[0m \u001B[0misComplexType\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0mt\u001B[0m\u001B[1;33m)\u001B[0m \u001B[1;32melse\u001B[0m \u001B[1;34m'd->d'\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0;32m    544\u001B[0m     \u001B[0mextobj\u001B[0m \u001B[1;33m=\u001B[0m \u001B[0mget_linalg_error_extobj\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0m_raise_linalgerror_singular\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[1;32m--> 545\u001B[1;33m     \u001B[0mainv\u001B[0m \u001B[1;33m=\u001B[0m \u001B[0m_umath_linalg\u001B[0m\u001B[1;33m.\u001B[0m\u001B[0minv\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0ma\u001B[0m\u001B[1;33m,\u001B[0m \u001B[0msignature\u001B[0m\u001B[1;33m=\u001B[0m\u001B[0msignature\u001B[0m\u001B[1;33m,\u001B[0m \u001B[0mextobj\u001B[0m\u001B[1;33m=\u001B[0m\u001B[0mextobj\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0m\u001B[0;32m    546\u001B[0m     \u001B[1;32mreturn\u001B[0m \u001B[0mwrap\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0mainv\u001B[0m\u001B[1;33m.\u001B[0m\u001B[0mastype\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0mresult_t\u001B[0m\u001B[1;33m,\u001B[0m \u001B[0mcopy\u001B[0m\u001B[1;33m=\u001B[0m\u001B[1;32mFalse\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0;32m    547\u001B[0m \u001B[1;33m\u001B[0m\u001B[0m\n",
      "\u001B[1;32mE:\\Anaconda3\\lib\\site-packages\\numpy\\linalg\\linalg.py\u001B[0m in \u001B[0;36m_raise_linalgerror_singular\u001B[1;34m(err, flag)\u001B[0m\n\u001B[0;32m     86\u001B[0m \u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0;32m     87\u001B[0m \u001B[1;32mdef\u001B[0m \u001B[0m_raise_linalgerror_singular\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0merr\u001B[0m\u001B[1;33m,\u001B[0m \u001B[0mflag\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m:\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[1;32m---> 88\u001B[1;33m     \u001B[1;32mraise\u001B[0m \u001B[0mLinAlgError\u001B[0m\u001B[1;33m(\u001B[0m\u001B[1;34m\"Singular matrix\"\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0m\u001B[0;32m     89\u001B[0m \u001B[1;33m\u001B[0m\u001B[0m\n\u001B[0;32m     90\u001B[0m \u001B[1;32mdef\u001B[0m \u001B[0m_raise_linalgerror_nonposdef\u001B[0m\u001B[1;33m(\u001B[0m\u001B[0merr\u001B[0m\u001B[1;33m,\u001B[0m \u001B[0mflag\u001B[0m\u001B[1;33m)\u001B[0m\u001B[1;33m:\u001B[0m\u001B[1;33m\u001B[0m\u001B[1;33m\u001B[0m\u001B[0m\n",
      "\u001B[1;31mLinAlgError\u001B[0m: Singular matrix"
     ]
    }
   ],
   "source": [
    "def MahalanobisDistance(x, y):\n",
    "    '''\n",
    "    马氏居立中的(x,y)与欧几里得距离的(x,y)不同,欧几里得距离中的(x,y)指2个样本，每个样本的维数为x或y的维数；这里的(x,y)指向量是2维的，样本个数为x或y的维数，若要计算n维变量间的马氏距离则需要改变输入的参数如(x,y,z)为3维变量。\n",
    "    '''\n",
    "    import numpy as np\n",
    "    x = np.array(x)\n",
    "    y = np.array(y)\n",
    "    \n",
    "    X = np.vstack([x,y])\n",
    "    X_T = X.T\n",
    "    sigma = np.cov(X)\n",
    "    sigma_inverse = np.linalg.inv(sigma)\n",
    "    \n",
    "    d1=[]\n",
    "    for i in range(0, X_T.shape[0]):\n",
    "        for j in range(i+1, X_T.shape[0]):\n",
    "            delta = X_T[i] - X_T[j]\n",
    "            d = np.sqrt(np.dot(np.dot(delta,sigma_inverse),delta.T))\n",
    "            d1.append(d)\n",
    "        \n",
    "    return d1\n",
    "print('MahalanobisDistance： ', MahalanobisDistance([1, 5], [4, 7]))\n",
    "print('MahalanobisDistance： ', MahalanobisDistance(2, 4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
